NAIST at the NLI 2013 Shared Task
نویسندگان
چکیده
This paper describes the Nara Institute of Science and Technology (NAIST) native language identification (NLI) system in the NLI 2013 Shared Task. We apply feature selection using a measure based on frequency for the closed track and try Capping and Sampling data methods for the open tracks. Our system ranked ninth in the closed track, third in open track 1 and fourth in open track 2.
منابع مشابه
Using Other Learner Corpora in the 2013 NLI Shared Task
Our efforts in the 2013 NLI shared task focused on the potential benefits of external corpora. We show that including training data from multiple corpora is highly effective at robust, cross-corpus NLI (i.e. open-training task 1), particularly when some form of domain adaptation is also applied. This method can also be used to boost performance even when training data from the same corpus is av...
متن کاملNAIST at 2013 CoNLL Grammatical Error Correction Shared Task
This paper describes the Nara Institute of Science and Technology (NAIST) error correction system in the CoNLL 2013 Shared Task. We constructed three systems: a system based on the Treelet Language Model for verb form and subjectverb agreement errors; a classifier trained on both learner and native corpora for noun number errors; a statistical machine translation (SMT)-based model for prepositi...
متن کاملVTEX System Description for the NLI 2013 Shared Task
This paper describes the system developed for the NLI 2013 Shared Task, requiring to identify a writer’s native language by some text written in English. I explore the given manually annotated data using word features such as the length, endings and character trigrams. Furthermore, I employ k-NN classification. Modified TFIDF is used to generate a stop-word list automatically. The distance betw...
متن کاملMaximizing Classification Accuracy in Native Language Identification
This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic English proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays written by nonnativ...
متن کاملA Report on the 2017 Native Language Identification Shared Task
Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2...
متن کامل